1. Introduction and Main Results

Let $X_1, \ldots, X_n$ be a sample from a decreasing density $f_0$ on $(0, \infty)$, and let $\hat f_n$ denote the Grenander estimator (i.e. the maximum likelihood estimator) of $f_0$. Thus $\hat f_n \equiv \hat f_n^L$ is the left derivative of the least concave majorant $\hat F_n$ of the empirical distribution function $\mathbb{F}_n$; see e.g. Grenander (1956), Groeneboom (1985), and Devroye (1987, chapter 8).

The Grenander estimator $\hat f_n$ is a uniformly consistent estimator of $f_0$ on sets bounded away from 0 if $f_0$ is continuous:

$$\sup_{x \ge c} |\hat f_n(x) - f_0(x)| \to_{a.s.} 0$$

for each $c > 0$. It is also known that $\hat f_n$ is consistent with respect to the $L_1$ ($\|p - q\|_1 \equiv \int |p(x) - q(x)|\,dx$) and Hellinger ($h^2(p, q) \equiv 2^{-1} \int [\sqrt{p(x)} - \sqrt{q(x)}]^2\,dx$) metrics: that is,

$$\|\hat f_n - f_0\|_1 \to_{a.s.} 0 \quad \text{and} \quad h(\hat f_n, f_0) \to_{a.s.} 0;$$

see e.g. Devroye (1987, Theorem 8.3, page 144) and van de Geer (1993).
However, it is also known that $\hat f_n(0) \equiv \hat f_n(0+)$ is an inconsistent estimator of $f_0(0) \equiv f_0(0+) = \lim_{x \searrow 0} f_0(x)$, even when $f_0(0) < \infty$. In fact, Woodroofe and Sun (1993) showed that

$$\hat f_n(0) \to_d f_0(0) \sup_{t > 0} \frac{\mathbb{N}(t)}{t} \stackrel{d}{=} f_0(0) \frac{1}{U} \tag{1.1}$$

as $n \to \infty$, where $\mathbb{N}$ is a standard Poisson process on $[0, \infty)$ and $U \sim$ Uniform(0, 1). Woodroofe and Sun (1993) introduced penalized estimators $\tilde f_n$ of $f_0$ which yield consistency at 0: $\tilde f_n(0) \to_p f_0(0)$. Kulikov and Lopuhaä (2006) study estimation of $f_0(0)$ based on the Grenander estimator $\hat f_n$ evaluated at points of the form $t = c n^{-\gamma}$. Among other things, they show that $\hat f_n(n^{-1/3}) \to_p f_0(0)$ if $|f_0'(0+)| > 0$.

Our view in this paper is that the inconsistency of $\hat f_n(0)$ as an estimator of $f_0(0)$ exhibited in (1.1) can be regarded as a simple consequence of the fact that the class of all monotone decreasing densities on $(0, \infty)$ includes many densities $f$ which are unbounded at 0, so that $f(0) = \infty$, and the Grenander estimator $\hat f_n$ simply has difficulty deciding which is true, even when $f_0(0) < \infty$. From this perspective we would like to have answers to the following three questions under some reasonable hypotheses concerning the growth of $f_0(x)$ as $x \searrow 0$:

Q1: How fast does $\hat f_n(0)$ diverge as $n \to \infty$?

Q2: Do the stochastic processes $\{b_n \hat f_n(a_n t) : 0 \le t \le c\}$ converge for some sequences $a_n$, $b_n$, and $c > 0$?

Q3: What is the behavior of the relative error

$$\sup_{0 \le x \le c_n} \left| \frac{\hat f_n(x)}{f_0(x)} - 1 \right|$$

for some constant $c_n$?



It turns out that answers to questions Q1 - Q3 are intimately related to the limiting behavior of the minimal order statistic $X_{n:1} \equiv \min\{X_1, \ldots, X_n\}$. By Gnedenko (1943) or de Haan and Ferreira (2006, Theorem 1.1.2, page 5), it is well-known that there exists a sequence $\{a_n\}$ such that

$$a_n^{-1} X_{n:1} \to_d Y \tag{1.2}$$

where $Y$ has a nondegenerate limiting distribution $G$ if and only if

$$n F_0(a_n x) \to x^{\gamma}, \qquad x > 0, \tag{1.3}$$

for some $\gamma > 0$, and hence $a_n \to 0$. One possible choice of $a_n$ is $a_n = F_0^{-1}(1/n)$, but any sequence $\{a_n\}$ satisfying $n F_0(a_n) \to 1$ also works. Since $F_0$ is concave, the convergence in (1.3) is uniform on any interval $[0, K]$. Concavity of $F_0$ and existence of $f_0$ also imply convergence of the derivative:

$$n a_n f_0(a_n x) \to \gamma x^{\gamma - 1}, \qquad x > 0. \tag{1.4}$$

By Gnedenko (1943), (1.2) is equivalent to

$$\lim_{x \to 0+} \frac{F_0(cx)}{F_0(x)} = c^{\gamma}, \qquad c > 0. \tag{1.5}$$

Thus (1.2), (1.3), and (1.5) are equivalent. In this case we have:

$$G(x) = 1 - e^{-x^{\gamma}}, \qquad x \ge 0. \tag{1.6}$$

Since $F_0$ is concave, the power $\gamma \in (0, 1]$.

As illustrations of our general result, we consider the following three hypotheses on $f_0$:

G0: The density $f_0$ is bounded at zero: $f_0(0) < \infty$.

G1: For some $\beta \ge 0$ and $0 < C_1 < \infty$,

$$(\log(1/x))^{-\beta} f_0(x) \to C_1 \quad \text{as } x \searrow 0.$$

G2: For some $0 \le \alpha < 1$ and $0 < C_2 < \infty$,

$$x^{\alpha} f_0(x) \to C_2 \quad \text{as } x \searrow 0.$$

Note that in G2 the value $\alpha = 1$ is not possible for a positive limit $C_2$, since $x f(x) \to 0$ as $x \to 0$ for any monotone density $f$; see e.g. Devroye (1986, Theorem 6.2, page 173). Below we assume that $F_0$ satisfies the condition (1.5). Our cases G0 and G1 correspond to $\gamma = 1$ and G2 to $\gamma = 1 - \alpha$.

One motivation for considering monotone densities which are unbounded at zero comes from the study of mixture models. An example of this type, as discussed by Donoho and Jin (2004), is as follows. Suppose $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$ where,

under $H_0$: $F = \Phi$, the standard normal d.f.,

under $H_1$: $F = (1 - \epsilon)\Phi + \epsilon \Phi(\cdot - \mu)$, $\epsilon \in (0, 1)$, $\mu > 0$.

If we transform to $Y_i \equiv 1 - \Phi(X_i) \sim G$, then, for $0 \le y \le 1$,

under $H_0$: $G(y) = y$, the Uniform(0, 1) d.f.,

under $H_1$: $G(y) = G_{\epsilon,\mu}(y) = (1 - \epsilon) y + \epsilon (1 - \Phi(\Phi^{-1}(1 - y) - \mu))$.

It is easily seen that the density $g_{\epsilon,\mu}$ of $G_{\epsilon,\mu}$, given by

$$g_{\epsilon,\mu}(y) = 1 - \epsilon + \epsilon \exp\left( \mu \Phi^{-1}(1 - y) - \mu^2/2 \right),$$

is monotone decreasing on $(0, 1)$ and is unbounded at zero. As we will show in Section 4, $G_{\epsilon,\mu}$ satisfies our key hypothesis (1.5) with $\gamma = 1$. Moreover, we will show that the whole class of models of this type with $\Phi$ replaced by the generalized Gaussian (or Subbotin) distribution also satisfies (1.5), and hence the behavior of the Grenander estimator at zero gives information about the behavior of the contaminating component of the mixture model (in the transformed form) at zero.
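The transformed normal mixture can be explored numerically. The following is a minimal sketch, not the authors' code: the function names `mixture_cdf` and `mixture_density` are hypothetical, and the density formula used is the one obtained by differentiating $G_{\epsilon,\mu}$ (the case $r = 2$ of (4.24)). It only uses the standard library.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides Phi (cdf) and Phi^{-1} (inv_cdf)


def mixture_cdf(y, eps, mu):
    """G_{eps,mu}(y) = (1 - eps) * y + eps * (1 - Phi(Phi^{-1}(1 - y) - mu)), for 0 < y < 1."""
    z = STD_NORMAL.inv_cdf(1.0 - y)
    return (1.0 - eps) * y + eps * (1.0 - STD_NORMAL.cdf(z - mu))


def mixture_density(y, eps, mu):
    """g_{eps,mu}(y) = 1 - eps + eps * exp(mu * Phi^{-1}(1 - y) - mu^2 / 2), for 0 < y < 1."""
    z = STD_NORMAL.inv_cdf(1.0 - y)
    return 1.0 - eps + eps * math.exp(mu * z - 0.5 * mu * mu)


if __name__ == "__main__":
    # The density grows without bound as y decreases to 0 (slowly, through Phi^{-1}).
    for y in (1e-1, 1e-3, 1e-6, 1e-9, 1e-12):
        print(y, mixture_density(y, eps=0.1, mu=1.0))
```

Evaluating the density on a grid shows it decreasing on $(0, 1)$, with slow growth near zero, which is in line with the exponent $\gamma = 1$ claimed for this family.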

Another motivation for studying these questions in the monotone density framework is to gain insights for a study of the corresponding questions in the context of nonparametric estimation of a monotone spectral density. In that (related, but different) setting, singularities at the origin are of particular interest; see e.g. Cox (1984), Beran (1994), and Ma (2002). Although our results here do not apply directly to the problem of nonparametric estimation of a monotone spectral density function, it seems plausible that similar results will hold in that setting; note that when $f$ is a spectral density, the assumptions G1 and G2 correspond to long-memory processes (with the usual description being in terms of $\beta = 1 - \alpha \in (0, 1)$ or the Hurst coefficient $H = 1 - \beta/2 = 1 - (1 - \alpha)/2 = (1 + \alpha)/2$). See Anevski and Soulier (2009) for recent work on nonparametric estimation of a monotone spectral density.

Let $\mathbb{N}$ denote the standard Poisson process on $\mathbb{R}_+$. When (1.5) and hence also (1.6) hold, it follows from Miller (1976, Theorem 2.1, page 522) together with Jacod and Shiryaev (2003, Theorem 2.15(c)(ii), pages 306-307), that

$$n \mathbb{F}_n(a_n t) \Rightarrow \mathbb{N}(t^{\gamma}) \quad \text{in } D[0, \infty), \tag{1.7}$$

which should be compared to (1.3).

Since we are studying the estimator $\hat f_n$ near zero and because the value of $\hat f_n$ at zero is defined as the right limit $\lim_{x \searrow 0} \hat f_n(x) \equiv \hat f_n(0)$, it is sensible to study instead the right-continuous modification of $\hat f_n$, and this of course coincides with the right derivative $\hat f_n^R$ of the least concave majorant $\hat F_n$ of the empirical distribution function $\mathbb{F}_n$. Therefore we change notation for the rest of this paper and write $\hat f_n$ for $\hat f_n^R$ throughout the following. We write $\hat f_n^L$ for the left-continuous Grenander estimator.

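We now obtain the following theorem concerning the behavior of the Grenander estimator at zero.

Theorem 1.1. Suppose that (1.5) holds. Let $a_n$ satisfy $n F_0(a_n) \sim 1$, and let $\hat h_\gamma$ denote the right derivative of the least concave majorant of $t \mapsto \mathbb{N}(t^{\gamma})$, $t \ge 0$. Then:

(i) $n a_n \hat f_n(t a_n) \Rightarrow \hat h_\gamma(t)$ in $D[0, \infty)$.

(ii) For all $c \ge 0$,

$$\sup_{0 < x \le c a_n} \left| \frac{\hat f_n(x)}{f_0(x)} - 1 \right| \to_d \sup_{0 < t \le c} \left| \frac{t^{1 - \gamma} \hat h_\gamma(t)}{\gamma} - 1 \right|.$$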

The behavior of $\hat f_n$ near zero under the different hypotheses G0, G1, and G2 now follows as corollaries to Theorem 1.1. Let $Y_\gamma \equiv \hat h_\gamma(0)$. We then have

$$Y_\gamma = \sup_{t > 0} \left( \mathbb{N}(t^{\gamma}) / t \right) = \sup_{s > 0} \left( \mathbb{N}(s) / s^{1/\gamma} \right). \tag{1.8}$$

Here we note that $Y_1 \stackrel{d}{=} 1/U$, where $U \sim$ Uniform(0, 1), so that $Y_1$ has distribution function $H_1(x) = 1 - 1/x$ for $x \ge 1$. The distribution of $Y_\gamma$ for $\gamma \in (0, 1]$ is given in Proposition 1.5 below. The first part of the following corollary was established by Woodroofe and Sun (1993).

Corollary 1.2. Suppose that G0 holds. Then $\gamma = 1$, $a_n^{-1} = n f_0(0+)$ satisfies $n F_0(a_n) \to 1$, and it follows that:

(i) $\hat f_n(0) \to_d f_0(0) \hat h_1(0) = f_0(0) Y_1$.

(ii) The processes $\{t \mapsto \hat f_n(t n^{-1}) : n \ge 1\}$ satisfy

$$\hat f_n(t n^{-1}) \Rightarrow f_0(0) \hat h_1(f_0(0) t) \quad \text{in } D[0, \infty).$$

(iii) For $c_n = c/n$ with $c > 0$,

$$\sup_{0 < x \le c_n} \left| \frac{\hat f_n(x)}{f_0(x)} - 1 \right| \to_d Y_1 - 1,$$

which has distribution function $H_1(x + 1) = 1 - 1/(x + 1)$ for $x \ge 0$.
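Part (i) of Corollary 1.2 can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: it uses Uniform(0, 1) data (so $f_0(0) = 1$ and G0 holds), relies on the fact that the slope at $0+$ of the least concave majorant of the empirical distribution function equals $\max_i (i/n)/X_{(i)}$, and the names `grenander_at_zero` and `simulate_uniform` are hypothetical.

```python
import random


def grenander_at_zero(sample):
    """Slope at 0+ of the least concave majorant of the ECDF: max_i (i/n) / X_(i)."""
    xs = sorted(sample)
    n = len(xs)
    return max((i + 1) / (n * x) for i, x in enumerate(xs))


def simulate_uniform(n, reps, seed=0):
    """Monte Carlo draws of the Grenander estimator at zero for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids a division by zero.
    return [grenander_at_zero([1.0 - rng.random() for _ in range(n)]) for _ in range(reps)]


if __name__ == "__main__":
    draws = simulate_uniform(n=200, reps=2000)
    for x in (1.5, 2.0, 4.0):
        empirical = sum(d <= x for d in draws) / len(draws)
        print(x, empirical, 1.0 - 1.0 / x)  # compare with H_1(x) = 1 - 1/x
```

The empirical distribution of the simulated values stays close to $H_1(x) = 1 - 1/x$ and does not concentrate at $f_0(0) = 1$, which is the inconsistency described by (1.1).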

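Corollary 1.3. Suppose that G1 holds. Then $F_0(x) \sim C_1 x (\log(1/x))^{\beta}$, so $\gamma = 1$, and $a_n^{-1} = C_1 n (\log n)^{\beta}$ satisfies $n F_0(a_n) \to 1$. It follows that:

(i)

$$\frac{\hat f_n(0)}{(\log n)^{\beta}} \to_d C_1 Y_1.$$

(ii) The processes $\{t \mapsto (\log n)^{-\beta} \hat f_n(t/(n (\log n)^{\beta})) : n \ge 1\}$ satisfy

$$\frac{1}{(\log n)^{\beta}} \hat f_n\left( \frac{t}{n (\log n)^{\beta}} \right) \Rightarrow C_1 \hat h_1(C_1 t) \quad \text{in } D[0, \infty).$$

(iii) For $c_n = c/(n (\log n)^{\beta})$ with $c > 0$,

$$\sup_{0 < x \le c_n} \left| \frac{\hat f_n(x)}{f_0(x)} - 1 \right| \to_d Y_1 - 1.$$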



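Corollary 1.4. Suppose that G2 holds and set $\tilde C_2 = (C_2/(1 - \alpha))^{1/(1 - \alpha)}$. Then $F_0(x) \sim C_2 x^{1 - \alpha}/(1 - \alpha)$, so $\gamma = 1 - \alpha$, $a_n^{-1} = \tilde C_2 n^{1/(1 - \alpha)}$ satisfies $n F_0(a_n) \to 1$, and it follows that:

(i)

$$\frac{\hat f_n(0)}{n^{\alpha/(1 - \alpha)}} \to_d \tilde C_2 Y_{1 - \alpha}.$$

(ii) The processes $\{t \mapsto n^{-\alpha/(1 - \alpha)} \hat f_n(t n^{-1/(1 - \alpha)}) : n \ge 1\}$ satisfy

$$\frac{1}{n^{\alpha/(1 - \alpha)}} \hat f_n\left( t n^{-1/(1 - \alpha)} \right) \Rightarrow \tilde C_2 \hat h_{1 - \alpha}(\tilde C_2 t) \quad \text{in } D[0, \infty).$$

(iii) For $c_n = c/n^{1/(1 - \alpha)}$ with $c > 0$,

$$\sup_{0 < x \le c_n} \left| \frac{\hat f_n(x)}{f_0(x)} - 1 \right| \to_d \sup_{0 < t \le c \tilde C_2} \left| \frac{t^{\alpha} \hat h_{1 - \alpha}(t)}{1 - \alpha} - 1 \right|.$$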

Taking $\beta = 0$ in (i) of Corollary 1.3 yields the limit theorem (1.1) of Woodroofe and Sun (1993) as a corollary; in this case $C_1 = f_0(0)$. Similarly, taking $\alpha = 0$ in (ii) of Corollary 1.4 yields the limit theorem (1.1) of Woodroofe and Sun (1993) as a corollary; in this case $C_2 = f_0(0)$. Note that Theorem 1.1 yields further corollaries when assumptions G1 and G2 are modified by other slowly varying functions.

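Recall the definition (1.8) of $Y_\gamma$. The following proposition gives the distribution of $Y_\gamma$ for $\gamma \in (0, 1]$.

Proposition 1.5. For fixed $0 < \gamma \le 1$ and $x > 0$,

$$P(Y_\gamma \le x) = P\left( \sup_{s > 0} \frac{\mathbb{N}(s)}{s^{1/\gamma}} \le x \right) = \begin{cases} 1 - 1/x, & \gamma = 1, \ x \ge 1, \\ 1 - \sum_{k=1}^{\infty} a_k(x, \gamma), & \gamma < 1, \ x > 0, \end{cases}$$

where the sequence $\{a_k(x, \gamma)\}_{k \ge 1}$ is constructed recursively as follows:

$$a_1(x, \gamma) = p\left( \left( \frac{1}{x} \right)^{\gamma}; 1 \right)$$

and, for $k > 1$,

$$a_k(x, \gamma) = p\left( \left( \frac{k}{x} \right)^{\gamma}; k \right) - \sum_{i=1}^{k-1} a_i(x, \gamma) \, p\left( \left( \frac{k}{x} \right)^{\gamma} - \left( \frac{i}{x} \right)^{\gamma}; k - i \right),$$

where $p(m; k) = e^{-m} m^k / k!$.

Figure 1. The distribution functions of $Y_\gamma$, $\gamma \in \{0.2, 0.4, 0.6, 0.8, 1.0\}$.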
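The recursion in Proposition 1.5 is straightforward to evaluate numerically. The sketch below is one possible implementation under stated assumptions: the infinite series is truncated after `n_terms` terms (a choice made here, not in the text), the Poisson probabilities $p(m; k)$ are computed on the log scale, and the names `poisson_pmf` and `cdf_Y` are hypothetical.

```python
import math


def poisson_pmf(m, k):
    """p(m; k) = exp(-m) * m**k / k!, evaluated on the log scale for stability."""
    if m <= 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))


def cdf_Y(x, gamma, n_terms=200):
    """P(Y_gamma <= x) from Proposition 1.5, with the series truncated at n_terms."""
    if x <= 0.0:
        return 0.0
    if gamma == 1.0:
        return max(0.0, 1.0 - 1.0 / x)
    t = [(k / x) ** gamma for k in range(n_terms + 1)]  # t[k] = (k/x)^gamma
    a = [0.0] * (n_terms + 1)
    for k in range(1, n_terms + 1):
        # a_k = p(t_k; k) - sum_{i<k} a_i * p(t_k - t_i; k - i)
        correction = sum(a[i] * poisson_pmf(t[k] - t[i], k - i) for i in range(1, k))
        a[k] = poisson_pmf(t[k], k) - correction
    return 1.0 - sum(a[1:])


if __name__ == "__main__":
    for gamma in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(gamma, [round(cdf_Y(x, gamma), 4) for x in (0.5, 1.0, 2.0, 5.0, 20.0)])
```

Because $(k/x)^{\gamma}$ grows more slowly than $k$ when $\gamma < 1$, the terms $a_k$ decay very quickly, so a moderate truncation level should be adequate for the range of $x$ shown in Figure 1.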
Remark 1.6. The random variables $Y_\gamma$ are increasingly heavy-tailed as $\gamma$ decreases; cf. Figure 1. Let $T_1, T_2, \ldots$ be the event times of the Poisson process $\mathbb{N}$; i.e. $\mathbb{N}(t) = \sum_{j=1}^{\infty} 1_{[T_j \le t]}$. Then note that

$$Y_\gamma \stackrel{d}{=} \sup_{j \ge 1} \frac{j}{T_j^{1/\gamma}} \ge \frac{1}{T_1^{1/\gamma}}$$

where $T_1 \sim$ Exponential(1). On the other hand

$$Y_\gamma = \sup_{t > 0} \left( \frac{\mathbb{N}(t)^{\gamma}}{t} \right)^{1/\gamma} \le \sup_{t > 0} \left( \frac{\mathbb{N}(t)}{t} \right)^{1/\gamma} \stackrel{d}{=} \left( \frac{1}{U} \right)^{1/\gamma}$$

where $U \sim$ Uniform(0, 1). Thus it is easily seen that $E(Y_\gamma^r) < \infty$ if and only if $r < \gamma$, and that the distribution function of $Y_\gamma$ is bounded above and below by the distribution functions of $1/T_1^{1/\gamma}$ and $1/U^{1/\gamma}$, respectively.
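The representation of $Y_\gamma$ through the event times $T_j$ in Remark 1.6 also gives a direct way to simulate it. The following sketch rests on assumptions made here for illustration: the supremum over $j$ is truncated at `n_events` (for $\gamma \le 1$ the ratio $j / T_j^{1/\gamma}$ does not grow for large $j$, so late indices should rarely matter), and the function names `sample_Y` and `empirical_cdf_Y` are hypothetical.

```python
import math
import random


def sample_Y(gamma, rng, n_events=300):
    """One draw of sup_{1 <= j <= n_events} j / T_j**(1/gamma) for a unit-rate Poisson process."""
    t, best = 0.0, 0.0
    for j in range(1, n_events + 1):
        t += rng.expovariate(1.0)  # T_j is a sum of j independent Exponential(1) gaps
        if t > 0.0:
            best = max(best, j / t ** (1.0 / gamma))
    return best


def empirical_cdf_Y(x, gamma, reps=2000, seed=1):
    """Monte Carlo estimate of P(Y_gamma <= x)."""
    rng = random.Random(seed)
    return sum(sample_Y(gamma, rng) <= x for _ in range(reps)) / reps


if __name__ == "__main__":
    x = 2.0
    for gamma in (0.5, 0.75, 1.0):
        lower = 1.0 - x ** (-gamma)       # distribution function of 1/U^(1/gamma) at x
        upper = math.exp(-x ** (-gamma))  # distribution function of 1/T_1^(1/gamma) at x
        print(gamma, lower, empirical_cdf_Y(x, gamma), upper)
```

For each $\gamma$ the simulated value should fall between the two bounds of the remark, up to Monte Carlo error.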

The proofs of the above results appear in Appendix A. They rely heavily on a set equality known as the "switching relation". We study this relation using convex analysis in Section 2. Section 3 gives some numerical results which accompany the results presented here, and Section 4 studies applications to the estimation of mixture models.



2. Switching relations 

In this section we consider several general variants of the so-called switching relation first given in Groeneboom (1985), and used repeatedly by other authors, including Kulikov and Lopuhaä (2005, 2006), and van der Vaart and Wellner (1996). Other versions of the switching relation were also studied by van der Vaart and van der Laan (2006, Lemma 4.1). In particular, we provide a novel proof of the result using convex analysis. This approach also allows us to re-state the relation without restricting the domain to compact intervals. Throughout this section we make use of definitions from convex analysis (cf. Rockafellar (1970), Rockafellar and Wets (1998), and Boyd et al. (2004)) which are given in Appendix B.

Suppose that $\Phi$ is a function, $\Phi : D \to \mathbb{R}$, defined on the (possibly infinite) closed interval $D \subset \mathbb{R}$. The least concave majorant $\hat\Phi$ of $\Phi$ is the pointwise infimum of all closed concave functions $g : D \to \mathbb{R}$ with $g \ge \Phi$. Since $\hat\Phi$ is concave, it is continuous on $D^{\circ}$, the interior of $D$. Furthermore, $\hat\Phi$ has left and right derivatives on $D^{\circ}$, and is differentiable with the exception of at most countably many points. Let $\hat\phi_L$ and $\hat\phi_R$ denote the left and right derivatives, respectively, of $\hat\Phi$.

If $\Phi$ is upper semicontinuous, then so is the function $\Phi_y(x) = \Phi(x) - yx$ for each $y \in \mathbb{R}$. If $D$ is compact, then $\Phi_y$ attains a maximum on $D$, and the set of points achieving the maximum is closed. Compactness of $D$ was assumed by van der Vaart and van der Laan (2006, see their Lemma 4.1, page 24). One of our goals here is to relax this assumption.

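Assuming they are defined, we consider the argmax functions

$$\kappa_L(y) \equiv \operatorname{argmax}^L_x \Phi_y(x) \equiv \operatorname{argmax}^L_x \{\Phi(x) - yx\} \equiv \inf\{x \in D : \Phi_y(x) = \sup_{z \in D} \Phi_y(z)\},$$

$$\kappa_R(y) \equiv \operatorname{argmax}^R_x \Phi_y(x) \equiv \operatorname{argmax}^R_x \{\Phi(x) - yx\} \equiv \sup\{x \in D : \Phi_y(x) = \sup_{z \in D} \Phi_y(z)\}.$$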

Theorem 2.1. Suppose that $\Phi$ is a proper upper-semicontinuous real-valued function defined on a closed subset $D \subset \mathbb{R}$. Then $\hat\Phi$ is proper if and only if $\Phi \le l$ for some linear function $l$ on $D$. Furthermore, if $\mathrm{conv}(\mathrm{hypo}(\Phi))$ is closed, then the functions $\kappa_L$ and $\kappa_R$ are well defined and the following two switching relations hold: for $x \in D$ and $y \in \mathbb{R}$,

S1: $\hat\phi_L(x) < y$ if and only if $\kappa_R(y) < x$.

S2: $\hat\phi_R(x) \le y$ if and only if $\kappa_L(y) \le x$.

When $\Phi$ is the empirical distribution function $\mathbb{F}_n$ as in Section 1, then $\hat\Phi = \hat F_n$ is the least concave majorant of $\mathbb{F}_n$, and $\hat\phi_L = \hat f_n^L$ is the Grenander estimator as defined in Section 1, while $\hat\phi_R = \hat f_n = \hat f_n^R$ is the right continuous version of the estimator. In this situation the argmax functions $\kappa_R$, $\kappa_L$ correspond to

$$\hat s_n^R(y) \equiv \sup\{x \ge 0 : \mathbb{F}_n(x) - yx = \sup_{z \ge 0} (\mathbb{F}_n(z) - yz)\},$$

$$\hat s_n^L(y) \equiv \inf\{x \ge 0 : \mathbb{F}_n(x) - yx = \sup_{z \ge 0} (\mathbb{F}_n(z) - yz)\}.$$

The switching relation given by Groeneboom (1985) says that with probability one

$$\{\hat f_n^L(x) \le y\} = \{\hat s_n^L(y) \le x\}. \tag{2.10}$$

van der Vaart and Wellner (1996, page 296) say that (2.10) holds for every $x$ and $y$; see also Kulikov and Lopuhaä (2005, page 2229), and Kulikov and Lopuhaä (2006, page 744). The advantage of (2.10) is immediate: the MLE is related to a continuous map of a process whose behavior is well-understood.

The following corollary gives the conclusion of Theorem 2.1 when $\Phi$ is the empirical distribution function $\mathbb{F}_n$.

Corollary 2.2. Let $\hat F_n$ be the least concave majorant of the empirical distribution function $\mathbb{F}_n$, and let $\hat f_n^L$ and $\hat f_n^R$ denote its left and right derivatives respectively. Then:

$$\{\hat f_n^L(x) < y\} = \{\hat s_n^R(y) < x\}, \tag{2.11}$$

$$\{\hat f_n^R(x) \le y\} = \{\hat s_n^L(y) \le x\}. \tag{2.12}$$



The following example shows, however, that the set identity (2.10) can fail.

Example 2.3. Suppose that we observe $(X_1, X_2, X_3) = (1, 2, 4)$. Then the MLE $\hat f_n^L$ is given by

$$\hat f_n^L(x) = \begin{cases} 1/3, & 0 < x \le 2, \\ 1/6, & 2 < x \le 4, \\ 0, & 4 < x < \infty. \end{cases}$$

The process $\hat s_n^L$ is given by

$$\hat s_n^L(y) = \begin{cases} 4, & 0 < y < 1/6, \\ 2, & 1/6 \le y < 1/3, \\ 0, & 1/3 \le y < \infty. \end{cases}$$

Note that (2.10) fails if $x = 4$ and $0 < y < 1/6$, since in this case $\hat f_n^L(x) = \hat f_n^L(4) = 1/6$ and the event $\{\hat f_n^L(x) \le y\}$ fails to hold while $\hat s_n^L(y) = 4$ and the event $\{\hat s_n^L(y) \le x\}$ holds. However, (2.11) does hold: with $x = 4$ and $0 < y < 1/6$, both of the events $\{\hat f_n^L(x) < y\}$ and $\{\hat s_n^R(y) < x\}$ fail to hold. Some checking shows that (2.11) as well as (2.12) hold for all other values of $x$ and $y$.
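The "some checking" in Example 2.3 can be carried out mechanically. The sketch below is an illustration with hypothetical function names (`lcm_knots`, `grenander`, `argmax_loc`); it assumes distinct positive observations and $y > 0$, computes the least concave majorant of the empirical distribution function with exact rational arithmetic, and evaluates the left and right derivatives and the two argmax functions.

```python
from fractions import Fraction as Fr


def lcm_knots(data):
    """Knots (x, value) of the least concave majorant of the ECDF, starting at (0, 0)."""
    xs = sorted(data)
    n = len(xs)
    pts = [(Fr(0), Fr(0))] + [(Fr(x), Fr(i + 1, n)) for i, x in enumerate(xs)]
    hull = []
    for px, py in pts:
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            # Drop b when it lies on or below the chord from a to the new point.
            if (bx - ax) * (py - ay) - (by - ay) * (px - ax) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def grenander(data, x, side):
    """Left ("L") or right ("R") derivative of the least concave majorant at x."""
    hull = lcm_knots(data)
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if (x0 < x <= x1) if side == "L" else (x0 <= x < x1):
            return (y1 - y0) / (x1 - x0)
    return Fr(0)


def argmax_loc(data, y, side):
    """inf ("L") or sup ("R") of the maximizers of F_n(z) - y*z over z >= 0, for y > 0."""
    xs = sorted(data)
    n = len(xs)
    # For y > 0 the maximum is attained at 0 or at an observation.
    cand = [(Fr(0), Fr(0))] + [(Fr(x), Fr(i + 1, n) - y * x) for i, x in enumerate(xs)]
    best = max(v for _, v in cand)
    locs = [z for z, v in cand if v == best]
    return min(locs) if side == "L" else max(locs)


if __name__ == "__main__":
    data, x, y = [1, 2, 4], Fr(4), Fr(1, 12)
    print(grenander(data, x, "L") <= y, argmax_loc(data, y, "L") <= x)  # (2.10): the events differ
    print(grenander(data, x, "L") < y, argmax_loc(data, y, "R") < x)    # (2.11): the events agree
```

Looping the same comparisons over a grid of $(x, y)$ values gives a quick numerical check of (2.11) and (2.12) for this data set.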

Our proof of Theorem 2.1 will be based on the following proposition which is a consequence of general facts concerning convex functions as given in Rockafellar (1970) and Rockafellar and Wets (1998).

Proposition 2.4. Let $h$ be a closed proper convex function on $\mathbb{R}$, and let $f$ be its conjugate,

$$f(y) = \sup_{x \in \mathbb{R}} \{yx - h(x)\}.$$

Let $h'_-$ and $h'_+$ be the left and right derivatives of $h$, and define functions $s_-$ and $s_+$ by

$$s_-(y) = \inf\{x \in \mathbb{R} : yx - h(x) = f(y)\}, \tag{2.13}$$

$$s_+(y) = \sup\{x \in \mathbb{R} : yx - h(x) = f(y)\}. \tag{2.14}$$

Then the following set identities hold:

$$\{(x, y) : h'_-(x) \le y\} = \{(x, y) : s_+(y) \ge x\}, \tag{2.15}$$

$$\{(x, y) : h'_+(x) < y\} = \{(x, y) : s_-(y) > x\}. \tag{2.16}$$



Proof. All the references in this proof are to Rockafellar (1970). By Theorem 24.3 (page 232) the set $\Gamma = \{(x, y) \in \mathbb{R}^2 : y \in \partial h(x)\}$ (i.e. the graph of $\partial h$) is a maximal complete non-decreasing curve. By Theorem 23.5, page 218, the closed proper convex function $h$ and its conjugate $f$ satisfy

$$h(x) + f(y) \ge xy$$

and equality holds if and only if $y \in \partial h(x)$, or equivalently if $x \in \partial f(y)$, where $\partial h$ and $\partial f$ denote the subdifferentials of $h$ and $f$ respectively (see page 215). Thus we also have:

$$\Gamma = \{(x, y) \in \mathbb{R}^2 : x \in \partial f(y)\},$$

and, by the definitions of $s_-$ and $s_+$,

$$\Gamma = \{(x, y) : s_-(y) \le x \le s_+(y)\}.$$

By Theorem 24.1 (page 227) the curve $\Gamma$ is defined by the left and right derivatives of $h$:

$$\Gamma = \{(x, y) : h'_-(x) \le y \le h'_+(x)\}. \tag{2.17}$$

Using the dual representation we obtain:

$$\Gamma = \{(x, y) : f'_-(y) \le x \le f'_+(y)\}, \tag{2.18}$$

therefore $s_- = f'_-$ and $s_+ = f'_+$. Moreover, the functions $h'_-$ and $f'_-$ are left-continuous, the functions $h'_+$ and $f'_+$ are right continuous, and all of these functions are nondecreasing.

From (2.17) and (2.18) it follows that:

$$\{h'_-(x) \le y\} = \{f'_+(y) \ge x\},$$

which implies (2.15). Since the functions $h$ and $f$ are conjugate to each other, the relations between them are symmetric. Thus we have

$$\{f'_-(y) \le x\} = \{h'_+(x) \ge y\},$$

or equivalently

$$\{f'_-(y) > x\} = \{h'_+(x) < y\},$$

which implies (2.16). □

Before proving Theorem 2.1 we need the following two lemmas.

Lemma 2.5. Let $S = \operatorname{argmax}_D \Phi$ and $\hat S = \operatorname{argmax}_D \hat\Phi$ be the maximal superlevel sets of $\Phi$ and $\hat\Phi$. Then the set $S$ is defined if and only if the set $\hat S$ is defined, and in this case $\mathrm{conv}(S) \subset \hat S$.

Lemma 2.6. If $\mathrm{conv}(\mathrm{hypo}(\Phi))$ is a closed convex set then $\mathrm{conv}(S) = \hat S$.

Proof of Lemma 2.5. Since $\mathrm{cl}(\Phi) \le \hat\Phi$, the set $S$ is defined if $\hat S$ is defined. On the other hand, if $S$ is defined then $\Phi$ is bounded from above on $D$. Since

$$\sup_D \Phi = \sup_D \hat\Phi,$$

the function $\hat\Phi$ is also bounded from above on $D$, i.e. the set $\hat S$ is defined.

By (2.19) we have $S \subset \hat S$. Since $\Phi$ and $\hat\Phi$ are upper semicontinuous, the sets $S$ and $\hat S$ are closed. Since $\hat S$ is convex we have $\mathrm{conv}(S) \subset \hat S$. □

Proof of Lemma 2.6. Indeed, we have $\mathrm{conv}(\mathrm{hypo}(\Phi)) = \mathrm{conv}(\mathrm{cl}(\mathrm{hypo}(\Phi)))$, and

$$\mathrm{conv}(\mathrm{hypo}(\Phi)) \subset \mathrm{hypo}(\hat\Phi).$$

Therefore $\mathrm{conv}(\mathrm{hypo}(\Phi))$ is the hypograph of some closed concave function $H$ such that:

$$\Phi \le H \le \hat\Phi.$$

Thus $H = \hat\Phi$. The set $S$ is a face of $\mathrm{hypo}(\Phi)$ and the set $\mathrm{conv}(S)$ is a face of $\mathrm{conv}(\mathrm{hypo}(\Phi))$. The statement now follows from Rockafellar (1970, Theorem 18.3, page 165). □

Proof of Theorem 2.1. To prove the first statement, first suppose $\hat\Phi$ is proper. We have:

$$\mathrm{hypo}(\Phi) \subset \mathrm{hypo}(\mathrm{cl}(\Phi)) = \mathrm{cl}(\mathrm{hypo}(\Phi)) \subset \mathrm{cl}(\mathrm{conv}(\mathrm{hypo}(\Phi))) = \mathrm{hypo}(\hat\Phi) \tag{2.19}$$

and therefore $\mathrm{hypo}(\Phi)$ is bounded by any support plane of $\mathrm{hypo}(\hat\Phi)$. This implies that there exists a linear function $l$ such that $\Phi \le l$.

Now suppose that there exists a linear function $l$ such that $\Phi \le l$ on $D$. Then $\mathrm{cl}(\Phi) \le l$ and from (2.19) we have:

$$\mathrm{hypo}(\Phi) \subset \mathrm{hypo}(l),$$

$$\mathrm{conv}(\mathrm{hypo}(\Phi)) \subset \mathrm{hypo}(l),$$

$$\mathrm{hypo}(\hat\Phi) = \mathrm{cl}(\mathrm{conv}(\mathrm{hypo}(\Phi))) \subset \mathrm{hypo}(l).$$

Thus $\hat\Phi < +\infty$ on $D$. Since $\mathrm{hypo}(\Phi) \subset \mathrm{hypo}(\hat\Phi)$ there exists a finite point in $\mathrm{hypo}(\hat\Phi)$.

To show that the two switching relations hold, first consider the convex function $h = -\hat\Phi$. Then

$$\hat\phi_L(x) = -h'_-(x), \qquad \hat\phi_R(x) = -h'_+(x), \qquad \kappa_L(y) = s_-(-y), \qquad \kappa_R(y) = s_+(-y),$$

and by the properness of $\hat\Phi$ proved above and Proposition 2.4 it suffices to show that

$$\operatorname{argmax}^L_x (\Phi(x) - yx) = \operatorname{argmax}^L_x (\hat\Phi(x) - yx),$$

$$\operatorname{argmax}^R_x (\Phi(x) - yx) = \operatorname{argmax}^R_x (\hat\Phi(x) - yx).$$

To accomplish this, it suffices, without loss of generality, to prove the equalities in the last display when $y = 0$, and this in turn will follow if we relate the maximal superlevel sets of $\Phi$ and $\hat\Phi$. This follows from Lemmas 2.5 and 2.6. □
Remark 2.7. Note that $\mathrm{conv}(S) \ne \hat S$ in general. To see this, consider the function $\Phi$ defined on $\mathbb{R}$ as follows:

$$\Phi(x) = \begin{cases} \ldots, & x \ne 0, \\ 1, & x = 0. \end{cases}$$

We have that $\Phi$ is upper-semicontinuous, $S = \{0\}$ and $\hat\Phi \equiv 1$, so $\hat S = \mathbb{R}$.

Remark 2.8. Note that if $\mathrm{conv}(\mathrm{hypo}(\Phi))$ is a polyhedral set, then it is closed (see e.g. Rockafellar (1970, Corollary 19.1.2)). This is the case in our applications.



3. Some Numerical Results 

Figure 2 gives plots of the empirical distributions of $m = 10000$ Monte Carlo samples from the distributions of $\hat f_n(0)/(C_2 n^{\alpha}/(1 - \alpha))^{1/(1 - \alpha)}$ when $n = 200$ and $n = 500$, together with the limiting distribution function. The true density $f_0$ on the right side in Figure 2 is

$$f_0(x) = \int_0^{\infty} \frac{1}{y} 1_{[0, y]}(x) \frac{y^{c - 1}}{\Gamma(c)} \exp(-y)\,dy. \tag{3.20}$$

For $c \in (0, 1)$, this family satisfies (G2) with $\alpha = 1 - c$ and $C_2 = 1/(\alpha \Gamma(1 - \alpha))$. (Note that for $c = 1$, $f_0(x) \sim \log(1/x)$ as $x \searrow 0$.)

The true density $f_0$ on the left side in Figure 2 is

$$f_0(x) = \frac{1}{\mathrm{Beta}(1 - a, 2)} x^{-a} (1 - x) 1_{(0, 1]}(x). \tag{3.21}$$

For $a \in [0, 1)$, this family satisfies (G2) with $\alpha = a$ and $C_2 = 1/\mathrm{Beta}(1 - a, 2)$.

Figure 2. Empirical distributions of the re-scaled MLE at zero when sampling from the Beta distribution (left) and the Gamma distribution (right): from top to bottom we have $\alpha = 0.2, 0.5, 0.8$. Each panel shows the curves for $n = 200$, $n = 500$ and $n = \infty$.

Figure 3 shows simulations of the limiting distribution

$$\sup_{0 < t \le c} \left| \frac{t^{1 - \gamma} \hat h_\gamma(t)}{\gamma} - 1 \right| \tag{3.22}$$

for different values of $c$ and $\gamma$. Recall that if $\gamma = 1$ the supremum occurs at $t = 0$ regardless of the value of $c$, and the limiting distribution (3.22) has cumulative distribution function $1 - 1/(x + 1)$. However, for $\gamma < 1$, the distribution of (3.22) depends both on $\gamma$ and on $c$, although the dependence on $c$ is not visually prominent in Figure 3. Table 1 shows estimated values of

$$P\left( \sup_{0 < t \le c} \left| \frac{t^{1 - \gamma} \hat h_\gamma(t)}{\gamma} - 1 \right| = 1 \right) \tag{3.23}$$

for different $c$ and $\gamma < 1$, which clearly depends on the cutoff value $c$ (the upper bound on the standard deviation in each case is 0.016). Note that (3.22) is equal to one if the location of the supremum occurs at $t = 0$ (with probability one).
Figure 3. Empirical distributions of the supremum measure: the cutoff values shown are $c = 5$ (top left), $c = 25$ (top right), $c = 100$ (bottom left), $c = 1000$ (bottom right).

Table 1. Simulation of (3.23) for different values of $\gamma$ and $c$.

                   c = 0.5   c = 5   c = 25   c = 100   c = 1000
$\gamma = 0.25$     0.361    0.171   0.140    0.092     0.06
$\gamma = 0.50$     0.422    0.249   0.190    0.162     0.148
$\gamma = 0.75$     0.489    0.387   0.349    0.358     0.367

Cumulative distribution functions for the location of the supremum in (3.22) are shown in Figure 4, which clearly depend both on $\gamma$ and on $c$.

Figure 4. Empirical distributions of the location where the supremum occurs: from left to right we have $\gamma = 0.25, 0.50, 0.75$. Recall that for $\gamma = 1$, the (non-unique) location of the supremum is always zero by Corollary 1.2. The data were re-scaled to lie within the interval $[0, 1]$.

4. Application to Mixtures

4.1. Behavior near zero. First, suppose that $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$ where,

under $H_0$: $F = \Phi_r$, the generalized normal distribution,

under $H_1$: $F = (1 - \epsilon)\Phi_r + \epsilon \Phi_r(\cdot - \mu)$, $\epsilon \in (0, 1)$, $\mu > 0$,

where $\Phi_r(x) \equiv \int_{-\infty}^{x} \phi_r(y)\,dy$ with $\phi_r(y) \equiv \exp(-|y|^r/r)/C_r$ for $r > 0$ gives the generalized normal (or Subbotin) distribution; here $C_r \equiv 2\Gamma(1/r) r^{(1/r) - 1}$ is the normalizing constant. If we transform to $Y_i \equiv 1 - \Phi_r(X_i) \sim G$, then, for $0 \le y \le 1$,

under $H_0$: $G(y) = y$, the Uniform(0, 1) d.f.,

under $H_1$: $G(y) = G_{\epsilon,\mu,r}(y) = (1 - \epsilon) y + \epsilon (1 - \Phi_r(\Phi_r^{-1}(1 - y) - \mu))$.

Let $g_{\epsilon,\mu,r}$ denote the density of $G_{\epsilon,\mu,r}$; thus

$$g_{\epsilon,\mu,r}(y) = 1 - \epsilon + \epsilon \exp\left\{ -\frac{1}{r} \left( |\Phi_r^{-1}(1 - y) - \mu|^r - |\Phi_r^{-1}(1 - y)|^r \right) \right\}. \tag{4.24}$$

It is easily seen that $g_{\epsilon,\mu,r}$ is monotone decreasing on $(0, 1)$ and is unbounded at zero if $r > 1$. Figure 5 shows plots of these densities for $\epsilon = .1$, $\mu = 1$, and $r \in \{1.0, 1.1, \ldots, 2.0\}$. Note that $g_{\epsilon,\mu,1}$ is bounded at 0: in fact $g_{\epsilon,\mu,1}(y) = 1 - \epsilon + \epsilon e^{\mu}$ for $0 \le y \le 2^{-1} e^{-\mu}$.

Figure 5. Generalized Gaussian (or Subbotin) mixture densities with $\epsilon = .1$, $\mu = 1$, $r \in \{1.0, 1.2, \ldots, 2.0\}$ (black to light grey, respectively) as given by (4.24).

Proposition 4.1. The distribution $F_{\mu,r}(y) \equiv 1 - \Phi_r(\Phi_r^{-1}(1 - y) - \mu)$ is regularly varying at 0 with exponent 1. That is, for any $c > 0$,

$$\lim_{y \to 0+} \frac{F_{\mu,r}(cy)}{F_{\mu,r}(y)} = c,$$

i.e. (1.5) holds with $\gamma = 1$.



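Proof. Define $\kappa_r(y) = \Phi_r^{-1}(1 - y)$. Our first goal will be to show that

$$\lim_{y \to 0} \frac{\kappa_r(y)}{k_r(y)} = 1, \tag{4.25}$$

where (for $y$ small)

$$k_r(y) \equiv \left\{ -r \log\left( C_r y \left( r \log \frac{1}{C_r y} \right)^{(r - 1)/r} \right) \right\}^{1/r}.$$

To prove (4.25), it is enough to show that

$$\lim_{y \to 0} k_r(y)^{r - 1} \left( \kappa_r(y) - k_r(y) \right) = 0. \tag{4.26}$$

This result follows from de Haan and Ferreira (2006, Theorem 1.1.2). Define

$$b_n = k_r(1/n), \qquad a_n = 1/b_n^{r - 1},$$

and choose $F = \Phi_r$ in the statement of Theorem 1.1.2. Then, if we can show that

$$n (1 - \Phi_r(a_n x + b_n)) \to -\log G(x) = e^{-x}, \qquad x \in \mathbb{R}, \tag{4.27}$$

it follows from de Haan and Ferreira (2006, Theorem 1.1.2 and Section 1.1.2) that for all $x > 0$

$$\lim_{y \to 0} \frac{U(x/y) - b_{\lfloor 1/y \rfloor}}{a_{\lfloor 1/y \rfloor}} = G^{-1}(e^{-1/x}) = \log x,$$

where $U(t) = (1/(1 - \Phi_r))^{-1}(t) = \Phi_r^{-1}(1 - 1/t)$. Choosing $x = 1$ yields (4.26). Therefore, we need to prove (4.27).

To do this, we make use of the following, which is a generalization of Mills' ratio to the generalized Gaussian family:

$$\frac{\phi_r(z)}{1 - \Phi_r(z)} \sim z^{r - 1} \quad \text{as } z \to \infty. \tag{4.28}$$

The statement follows from l'Hôpital's rule:

$$\lim_{z \to \infty} \frac{1 - \Phi_r(z)}{z^{1 - r} \phi_r(z)} = \lim_{z \to \infty} \frac{-\phi_r(z)}{(1 - r) z^{-r} \phi_r(z) + z^{1 - r} \phi_r(z)(-z^{r - 1})} = \lim_{z \to \infty} \frac{1}{1 - (1 - r) z^{-r}} = 1.$$

Now, since $a_n b_n^{r - 1} = 1$ and $a_n/b_n \to 0$, so that $(a_n x + b_n)^r / r = b_n^r / r + x + o(1)$,

$$\begin{aligned} n (1 - \Phi_r(a_n x + b_n)) &\sim n \, \frac{\phi_r(a_n x + b_n)}{(a_n x + b_n)^{r - 1}} \\ &= \frac{n}{C_r b_n^{r - 1}} \exp\left\{ -\frac{(a_n x + b_n)^r}{r} \right\} \frac{1}{(1 + a_n x / b_n)^{r - 1}} \\ &\sim \exp\left\{ -\left( \frac{b_n^r}{r} + (r - 1) \log b_n - \log n + \log C_r \right) \right\} \exp(-x) \\ &\to \exp(-0) \cdot \exp(-x) \end{aligned}$$

by using the definition of $b_n$. We have thus shown that (4.25) holds.

Then, for $y \to 0$, by (4.28) and (4.25),

$$F_{\mu,r}(y) = 1 - \Phi_r(\kappa_r(y) - \mu) \sim \frac{\phi_r(\kappa_r(y) - \mu)}{(\kappa_r(y) - \mu)^{r - 1}}.$$

Plugging in the definition of $\phi_r$, and using (4.26) to replace $\kappa_r$ by $k_r$ in the exponent, we find that

$$\begin{aligned} F_{\mu,r}(y) &\sim \frac{1/C_r}{(\kappa_r(y) - \mu)^{r - 1}} \exp\left\{ -\frac{1}{r} (\kappa_r(y) - \mu)^r \right\} \\ &= \frac{1/C_r}{(\kappa_r(y) - \mu)^{r - 1}} \exp\left\{ -\frac{\kappa_r(y)^r}{r} \left( 1 - \frac{\mu}{\kappa_r(y)} \right)^r \right\} \\ &\sim \frac{1/C_r}{(\kappa_r(y) - \mu)^{r - 1}} \exp\left\{ \left( \log(C_r y) + \frac{r - 1}{r} \log\left( r \log \frac{1}{C_r y} \right) \right) \left( 1 - \frac{\mu}{k_r(y)} \right)^r \right\} \\ &= \frac{1/C_r}{(\kappa_r(y) - \mu)^{r - 1}} \left\{ C_r y \left( r \log \frac{1}{C_r y} \right)^{(r - 1)/r} \right\}^{(1 - \mu/k_r(y))^r}. \end{aligned}$$

Note that $\lim_{y \to 0} k_r(cy)/k_r(y) = 1$. Therefore,

$$\frac{F_{\mu,r}(cy)}{F_{\mu,r}(y)} \sim \frac{(C_r c y)^{(1 - \mu/k_r(cy))^r}}{(C_r y)^{(1 - \mu/k_r(y))^r}} \cdot \left( \frac{\kappa_r(y) - \mu}{\kappa_r(cy) - \mu} \right)^{r - 1} \cdot \frac{\left\{ r \log \frac{1}{C_r c y} \right\}^{\frac{r - 1}{r} (1 - \mu/k_r(cy))^r}}{\left\{ r \log \frac{1}{C_r y} \right\}^{\frac{r - 1}{r} (1 - \mu/k_r(y))^r}} \to c \cdot 1 \cdot 1 = c.$$

Thus (1.5) holds with $\gamma = 1$. □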



By the theory of regular variation (see e.g. Bingham et al. (1989, page 21)), this implies that $F_{\mu,r}(y) = y \ell(y)$ where $\ell$ is slowly varying at 0. It then follows easily that (1.5) holds for $F_0 = G_{\epsilon,\mu,r}$ with exponent 1. Thus our theory of Section 1 applies with $a_n$ of Theorem 1.1 taken to be $a_n = G_{\epsilon,\mu,r}^{-1}(1/n)$; i.e.

$$\frac{1}{n} = G_{\epsilon,\mu,r}(a_n) = (1 - \epsilon) a_n + \epsilon F_{\mu,r}(a_n) \approx \epsilon F_{\mu,r}(a_n)$$

where the last approximation is valid for $r > 1$, but not for $r = 1$. When $r = 1$, the first equality can be solved explicitly, and we find:

$$a_n = \begin{cases} 1 - \Phi_r\left( \Phi_r^{-1}(1 - 1/(n\epsilon)) + \mu \right), & \text{when } r > 1, \\ n^{-1} (1 - \epsilon + \epsilon e^{\mu})^{-1}, & \text{when } r = 1. \end{cases}$$

We conclude that Theorem 1.1 holds for $a_n$ as in the last display where $\hat f_n$ is the Grenander estimator of $g_{\epsilon,\mu,r}$ based on $Y_1, \ldots, Y_n$.

Another interesting mixture family to consider is as follows: suppose that $\Phi_1$, $\Phi_2$ are two fixed distribution functions: then

under $H_0$: $F = \Phi_1$,

under $H_1$: $F = (1 - \epsilon)\Phi_1 + \epsilon \Phi_2$, $\epsilon \in (0, 1)$. (4.29)

Using the transformation to $Y_i \equiv 1 - \Phi_1(X_i) \sim G$, then, for $0 \le y \le 1$ we find that under $H_1$ the distribution of the $Y_i$'s is given by

$$G(y) = (1 - \epsilon) y + \epsilon \left( 1 - \Phi_2(\Phi_1^{-1}(1 - y)) \right),$$

$$g(y) = (1 - \epsilon) + \epsilon \frac{\phi_2(\Phi_1^{-1}(1 - y))}{\phi_1(\Phi_1^{-1}(1 - y))}.$$

For $\Phi_2$ given in terms of $\Phi_1$ by the (Lehmann alternative) distribution function $\Phi_2(y) = 1 - (1 - \Phi_1(y))^{\gamma}$, this becomes

$$G(y) = (1 - \epsilon) y + \epsilon y^{\gamma},$$

$$g(y) = (1 - \epsilon) + \epsilon \gamma y^{\gamma - 1}.$$

When $0 < \gamma < 1$ this family fits into the framework of our condition G2 with $\alpha = 1 - \gamma$ and $C_2 = \epsilon\gamma$.
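For completeness, the Lehmann-alternative computation can be written out step by step. This is a short worked derivation, assuming $\Phi_1$ is continuous and strictly increasing so that $\Phi_1(\Phi_1^{-1}(u)) = u$ for $u \in (0, 1)$:

```latex
\begin{align*}
1 - \Phi_2\bigl(\Phi_1^{-1}(1-y)\bigr)
  &= \bigl(1 - \Phi_1(\Phi_1^{-1}(1-y))\bigr)^{\gamma} = y^{\gamma}, \\
G(y) &= (1-\epsilon)\,y + \epsilon\,y^{\gamma},
  \qquad g(y) = G'(y) = (1-\epsilon) + \epsilon\gamma\,y^{\gamma-1}, \\
y^{1-\gamma}\, g(y) &= (1-\epsilon)\,y^{1-\gamma} + \epsilon\gamma
  \;\longrightarrow\; \epsilon\gamma \qquad (y \searrow 0,\ 0 < \gamma < 1).
\end{align*}
```

The last line is condition G2 with exponent $1 - \gamma$ and limiting constant $\epsilon\gamma$.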

4.2. Estimation of the contaminating density. Suppose that $G_{\epsilon,F}(y) = (1 - \epsilon) y + \epsilon F(y)$ where $F$ is a concave distribution on $[0, 1]$ with monotone decreasing density $f$. Thus the density $g_{\epsilon,F}$ of $G_{\epsilon,F}$ is given by $g_{\epsilon,F}(y) = (1 - \epsilon) + \epsilon f(y)$. Note that $g_{\epsilon,F}$ is also monotone decreasing, and $g_{\epsilon,F}(y) \ge 1 - \epsilon + \epsilon f(1) = 1 - \epsilon = g_{\epsilon,F}(1)$ if $f(1) = 0$. For $\epsilon > 0$ we can write

$$f(y) = \frac{g_{\epsilon,F}(y) - (1 - \epsilon)}{\epsilon}.$$

If $Y_1, \ldots, Y_n$ are i.i.d. $g_{\epsilon,F}$ then we can estimate $g_{\epsilon,F}$ by the Grenander estimator $\hat g_n$, and we can estimate $\epsilon$ by

$$\hat\epsilon_n = 1 - \hat g_n(1).$$

This results in the following estimator $\hat f_n$ of the contaminating density $f$:

$$\hat f_n(y) = \frac{\hat g_n(y) - (1 - \hat\epsilon_n)}{\hat\epsilon_n} = \frac{\hat g_n(y) - \hat g_n(1)}{1 - \hat g_n(1)},$$

which is quite similar in spirit to a setting studied by Swanepoel (1999). Here, however, we propose using the shape constraint of monotonicity, and hence the Grenander estimator, to estimate both $\epsilon$ and $f$. We intend to study this estimator elsewhere.
Appendix A: Proofs for Section 1

Before proving Theorem 1.1 we need the following two lemmas. The first lemma shows that the functionals $\operatorname{argmax}^L$ and $\operatorname{argmax}^R$ are both $O_p(1)$, while the second shows these are equivalent almost surely for the limiting Poisson process. Together, these two lemmas will show that both functionals $\operatorname{argmax}^L$ and $\operatorname{argmax}^R$ are continuous. Below we assume that (1.5) holds and that $n F_0(a_n) \sim 1$. Thus both (1.3) and (1.7) also hold.

Lemma 5.2. (i) When 7 = 1 and x > 1, argmax^' ,R {n¥ n (a n v) — xv} = O p (l). 
(ii) When 7 € (0, 1) and x > 0, argmaXv ,R {n¥ n (a n v) — xv} = O p (l). 

Proof. It suffices to show that

$$\limsup_{n\to\infty} P\Big(\sup_{v \ge K}\{n\mathbb{F}_n(a_n v) - xv\} \ge 0\Big) \to 0, \quad \text{as } K \to \infty,$$

under the conditions specified. Let $h(x) = x(\log x - 1) + 1$ and recall the inequality

$$P(\mathrm{Bin}(n,p)/(np) \ge t) \le \exp(-np\,h(t))$$

for $t \ge 1$, where $\mathrm{Bin}(n,p)$ denotes a Binomial$(n,p)$ random variable; see e.g.
Shorack and Wellner (1986, inequality 10.3.2, page 415). It follows that

$$P\Big(\sup_{v \ge K}\{n\mathbb{F}_n(a_n v) - xv\} \ge 0\Big)
 = P\Big(\bigcup_{j=K}^{\infty}\{n\mathbb{F}_n(a_n v) - xv \ge 0 \text{ for some } v \in [j, j+1)\}\Big)$$

$$\le \sum_{j=K}^{\infty} P\big(n\mathbb{F}_n(a_n(j+1)) - xj \ge 0\big)
 = \sum_{j=K}^{\infty} P\left(\frac{n\mathbb{F}_n(a_n(j+1))}{nF_0(a_n(j+1))} \ge \frac{xj}{nF_0(a_n(j+1))}\right)$$

$$\le \sum_{j=K}^{\infty} \exp\left(-nF_0(a_n(j+1))\, h\left(\frac{xj}{nF_0(a_n(j+1))}\right)\right). \tag{5.30}$$

Next, since $F_0$ is concave,

$$nF_0(a_n(j+1)) \le nF_0(a_n(K+1))\,\frac{j+1}{K+1}$$

for $j \ge K$, and $nF_0(a_n(K+1)) \to (K+1)^{\gamma}$ as $n \to \infty$. Therefore, for all $j \ge K$
and sufficiently large $n$, we have

$$\frac{xj}{nF_0(a_n(j+1))} \ge \delta (K+1)^{1-\gamma}\,\frac{xj}{j+1}$$

for any fixed $\delta < 1$. We need to handle the two cases $\gamma = 1$ and $\gamma < 1$ separately.
Note that if $\gamma < 1$, then the above display shows that $K, n$ can be chosen sufficiently
large so that $(xj)/nF_0(a_n(j+1))$ is uniformly large. On the other hand, if
$\gamma = 1$ and $x > 1$ then we can pick $\delta$ close enough to 1 and $K, n$ large enough so that
$(xj)/nF_0(a_n(j+1))$ is strictly greater than $1 + \epsilon$ for some $\epsilon > 0$, again uniformly in $j$.

Suppose first that $\gamma < 1$. Then for $K, n$ large, since $h(x) \sim x \log x$ as $x \to \infty$,
there exists a constant $0 < C < 1$ such that for all $j \ge K$

$$nF_0(a_n(j+1))\, h\left(\frac{xj}{nF_0(a_n(j+1))}\right) \ge C\,(xj)\log\left(\frac{xj}{nF_0(a_n(j+1))}\right) \ge C_x\,(xj),$$

for some other constant $C_x > 0$. This shows that the sum in (5.30) converges to
zero as $K \to \infty$, as required.

Suppose next that $\gamma = 1$. Note that the function $h(x) > 0$ for $x > 1$.
Therefore, combining our arguments above, we find that for all $j \ge K$

$$nF_0(a_n(j+1))\, h\left(\frac{xj}{nF_0(a_n(j+1))}\right) \ge \delta\,(j+1)\, h\left(\frac{xj}{nF_0(a_n(j+1))}\right) \ge C_{x,\delta}\,(j+1),$$

again for some $C_{x,\delta} > 0$. This again implies that the sum in (5.30) converges to
zero as $K \to \infty$, and completes the proof. □
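The exponential inequality for the Binomial distribution used in the proof above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the particular values of `n`, `p` and the thresholds are illustrative assumptions. It compares the exact upper tail of a Binomial(n, p) variable with the bound exp(-np h(t)).

```python
import math

def h(t: float) -> float:
    """h(t) = t(log t - 1) + 1, the function used in the proof of Lemma 5.2."""
    return t * (math.log(t) - 1.0) + 1.0

def binomial_upper_tail(n: int, p: float, k_min: int) -> float:
    """Exact P(Bin(n, p) >= k_min), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

def check_bound(n: int, p: float, k_min: int):
    """Return (exact tail, exponential bound) for the threshold t = k_min / (n p)."""
    t = k_min / (n * p)
    return binomial_upper_tail(n, p, k_min), math.exp(-n * p * h(t))

# Illustrative values (assumed, not taken from the paper): np = 10.
results = [check_bound(200, 0.05, k) for k in (10, 15, 20, 30)]
for tail, bound in results:
    print(f"exact tail = {tail:.3e}, bound = {bound:.3e}")
```

In each row the exact tail should not exceed the bound; at the threshold t = 1 the bound is trivially equal to 1.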

Lemma 5.3. Suppose that $\gamma \in (0, 1]$. Then

$$V_x^L \equiv \mathrm{argmax}^L_v\{N(v^{\gamma}) - xv\} = \mathrm{argmax}^R_v\{N(v^{\gamma}) - xv\} \equiv V_x^R \quad \text{a.s.}$$

Proof. Suppose that $V_x^L < V_x^R$. Then it follows that $N((V_x^L)^{\gamma}) - xV_x^L =
N((V_x^R)^{\gamma}) - xV_x^R$, or, equivalently,

$$N((V_x^R)^{\gamma}) - N((V_x^L)^{\gamma}) = x\{V_x^R - V_x^L\}.$$

Now $(V_x^R)^{\gamma}, (V_x^L)^{\gamma} \in J(N) \equiv \{t > 0 : N(t) - N(t-) \ge 1\}$, so the left side
of the last display takes values in the set $\{1, 2, \ldots\}$, while the right side takes
values in $x \cdot \{r^{1/\gamma} - s^{1/\gamma} : r, s \in J(N), r > s\}$. But it is well-known that all the
(joint) distributions of the points in $J(N)$ are absolutely continuous with respect
to Lebesgue measure, and hence the equality in the last display holds only for
sets with probability 0. □
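Lemma 5.3 can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not the authors' construction: it simulates a standard Poisson process, truncates the search to a finite horizon (an assumption made only so that the loop terminates), and collects every maximizer of v -> N(v^gamma) - xv. Because the process decreases between jumps and is right-continuous, the supremum over the truncated range is attained at v = 0 or at a jump point, so only those candidates are examined.

```python
import random

def maximizers(x: float, gamma: float, horizon: float, rng: random.Random):
    """All maximizers of v -> N(v**gamma) - x*v over v in [0, horizon].

    N is a simulated standard Poisson process; the jump points of
    v -> N(v**gamma) are T_j**(1/gamma), with T_j the arrival times of N.
    The finite horizon is an assumption of this sketch.
    """
    best_value, best_points = 0.0, [0.0]   # value at v = 0 is 0
    arrival, count = 0.0, 0
    while True:
        arrival += rng.expovariate(1.0)    # next arrival time of N
        v = arrival ** (1.0 / gamma)       # corresponding jump point in v
        if v > horizon:
            break
        count += 1
        value = count - x * v              # process value right after the jump
        if value > best_value:
            best_value, best_points = value, [v]
        elif value == best_value:
            best_points.append(v)
    return best_points

# Illustrative parameters (assumed): gamma = 0.5, x = 1, 200 replications.
rng = random.Random(2024)
all_runs = [maximizers(1.0, 0.5, 50.0, rng) for _ in range(200)]
print(sum(len(points) == 1 for points in all_runs), "of", len(all_runs), "runs have a unique maximizer")
```

A unique maximizer in every run is what the lemma predicts, since ties between the left and right argmax occur with probability zero.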
Proof of Theorem 1.1. We first prove convergence of the one-dimensional
distributions of $na_n\hat f_n(a_n t)$. Fix $K > 0$, and let $x > 1_{\{\gamma = 1\}}$ and $t \in (0, K]$. By
the switching relation (2.12),

$$P(na_n\hat f_n(a_n t) < x) = P(\hat s_n^L(x/(na_n)) < a_n t)$$
$$= P(\mathrm{argmax}^L_s\{\mathbb{F}_n(s) - xs/(na_n)\} < a_n t)$$
$$= P(\mathrm{argmax}^L_v\{\mathbb{F}_n(v a_n) - x(v/n)\} < t)$$
$$= P(\mathrm{argmax}^L_v\{n\mathbb{F}_n(v a_n) - xv\} < t)$$
$$\to P(\mathrm{argmax}^L_v\{N(v^{\gamma}) - xv\} < t)$$
$$= P(\hat h_{\gamma}(t) < x),$$

where the convergence follows from (1.7) and the argmax continuous mapping
theorem for $D[0, \infty)$ applied to the processes $\{v \mapsto n\mathbb{F}_n(va_n) - xv : v \ge 0\}$; see
e.g. Ferger (2004, Theorem 3 and Corollary 1). Note that Lemma 5.2 yields the
$O_p(1)$ hypothesis of Ferger's Corollary 1, while Lemma 5.3 shows that equality
holds in the limit conclusion.
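For readers who want to experiment with the quantities in this proof, here is a minimal sketch of how the Grenander estimator can be computed as the slopes of the least concave majorant of the empirical distribution function. It is an illustration only: the function names, the Exponential(1) sample and the sample size are assumptions, and the sample is assumed to have no ties. The value at zero is compared with the closed form max_k (k/n)/X_(k), which is the supremum of the empirical distribution function divided by t.

```python
import random

def cross(o, a, b):
    """Cross product of o->a and o->b (positive for a counter-clockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def grenander(sample):
    """Knots and slopes of the least concave majorant of the empirical cdf.

    The estimator equals slopes[i] on (knots[i], knots[i + 1]].
    Assumes a positive sample without ties.
    """
    xs = sorted(sample)
    n = len(xs)
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    knots = [q[0] for q in hull]
    slopes = [(b[1] - a[1]) / (b[0] - a[0]) for a, b in zip(hull, hull[1:])]
    return knots, slopes

# Illustrative data (assumed): Exponential(1), so the true density at 0 is 1.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(500)]
knots, slopes = grenander(sample)
n = len(sample)
f_hat_at_zero = slopes[0]
closed_form = max((i + 1) / n / x for i, x in enumerate(sorted(sample)))
print(f_hat_at_zero, closed_form)
```

Repeating the experiment over many samples gives a rough picture of the non-degenerate limit of the estimator at zero discussed in Section 1.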

Convergence of the finite-dimensional distributions of $h_n(t) \equiv na_n\hat f_n(a_n t)$
follows in the same way by using the process convergence in (1.7) for finitely
many values $(t_1, x_1), \ldots, (t_m, x_m)$ where each $t_j \in \mathbb{R}^+$ and $x_j > 1_{\{\gamma = 1\}}$.



To verify tightness of $h_n$ in $D[0, \infty)$ we use Billingsley (1999, Theorem 16.8).
Thus, it is sufficient to show that for any $K > 0$, and any $\epsilon > 0$

$$\lim_{M\to\infty}\limsup_{n} P\Big(\sup_{0 \le t \le K}|h_n(t)| \ge M\Big) = 0, \tag{5.31}$$

$$\lim_{\delta\to 0}\limsup_{n} P\big(w_{\delta,K}(h_n) \ge \epsilon\big) = 0, \tag{5.32}$$

where $w_{\delta,K}(h)$ is the modulus of continuity in the Skorohod topology defined as

$$w_{\delta,K}(h) = \inf_{\{t_i\}_r}\max_{0 < i \le r}\sup\{|h(t) - h(s)| : s, t \in [t_{i-1}, t_i) \cap [0, K]\},$$

where $\{t_i\}_r$ is a partition of $[0, K]$ such that $0 = t_0 < t_1 < \cdots < t_r = K$
and $t_i - t_{i-1} > \delta$. Suppose then that $h$ is a piecewise constant function with
discontinuities occurring at the (ordered) points $\{\tau_j\}_{j \ge 0}$. Then if $\delta < \inf_j|\tau_j - \tau_{j-1}|$
we necessarily have that $w_{\delta,K}(h) = 0$.
First, note that since $h_n$ is non-increasing,

$$\|h_n\|_0^K \equiv \sup_{0 \le t \le K}|h_n(t)| = h_n(0),$$

and hence (5.31) follows from the finite-dimensional convergence proved above.

Next, fix $\epsilon > 0$. Let $0 = \tau_{n,0} < \tau_{n,1} < \cdots < \tau_{n,K_n} \le K$ denote the (ordered)
jump points of $h_n$, and let $0 = T_{n,0} < T_{n,1} < \cdots < T_{n,J_n} \le K$ denote the (again,
ordered) jump points of $n\mathbb{F}_n(a_n t)$. Because $\{\tau_{n,1}, \ldots, \tau_{n,K_n}\} \subset \{T_{n,1}, \ldots, T_{n,J_n}\}$,
it follows that $\inf_j\{\tau_{n,j} - \tau_{n,j-1}\} \ge \inf_j\{T_{n,j} - T_{n,j-1}\}$ and hence

$$P\big(w_{\delta,K}(h_n) \ge \epsilon\big) \le P\Big(\inf_{j=1,\ldots,J_n}\{T_{n,j} - T_{n,j-1}\} \le \delta\Big).$$



Now, by (1.7) and continuity of the inverse map (see e.g. Whitt (2002, Theorem
13.6.3, page 446))

$$(T_{n,1}, \ldots, T_{n,J_n}, 0, 0, \ldots) \Rightarrow (T_1^{1/\gamma}, \ldots, T_J^{1/\gamma}, 0, 0, \ldots),$$

where $T_1, \ldots, T_J$ denote the successive arrival times on $[0, K]$ of a standard
Poisson process. Thus,

$$\lim_{\delta\to 0} P\Big(\inf_{j=1,\ldots,J}\{T_j^{1/\gamma} - T_{j-1}^{1/\gamma}\} \le \delta\Big) = 0,$$

and therefore (5.32) holds. This completes the proof of (i).
Now we prove (ii): Fix $0 < c < \infty$. We first write

$$\sup_{0 < x \le ca_n}\left|\frac{\hat f_n(x)}{f_0(x)} - 1\right| = \sup_{0 < t \le c}\left|\frac{na_n\hat f_n(ta_n)}{na_n f_0(ta_n)} - 1\right|. \tag{5.33}$$



Suppose we could show that the ratio process $na_n\hat f_n(a_n t)/(na_n f_0(a_n t))$ converges
to the process $t^{1-\gamma}\hat h_{\gamma}(t)/\gamma$ in $D[0, \infty)$. Then the conclusion follows by noting that
the functional $h \mapsto \sup_{0 < t \le c}|h(t)|$ is continuous in the Skorohod topology as long as
$c$ is not a point of discontinuity of $h$ (Jacod and Shiryaev (2003, Proposition VI
2.4, page 339)). Since $N(t^{\gamma})$ is stochastically continuous (i.e. $P(N(t^{\gamma}) - N(t^{\gamma}-) >
0) = 0$ for each fixed $t > 0$), $t^{1-\gamma}\hat h_{\gamma}(t)/\gamma$ is almost surely continuous at $c$.

It remains to prove convergence of the ratio. Fix $K > c$, and again we may
assume that $K$ is a continuity point. Consider first the term in the denominator,
$na_n f_0(a_n t)$: it follows from (1.4) that

$$g_n(t) \equiv (na_n f_0(a_n t))^{-1} \to \gamma^{-1}t^{1-\gamma} \equiv g(t),$$

where $g$ is monotone increasing and uniformly continuous on $[0, K]$. Thus $g_n \to g$
in $C[0, K]$. Since the term in the numerator satisfies $h_n(t) = na_n\hat f_n(a_n t) \Rightarrow
\hat h_{\gamma}(t) \equiv h(t)$ in $D[0, K]$, it follows that $g_n h_n \Rightarrow gh$ in $D[0, K]$, as required. Here,
we have again used the continuity of the supremum. This completes the proof
of (ii). □

Before proving Corollaries 1.2 - 1.4 we state the following lemma.

Lemma 5.4. Suppose that $a_n = p(1/n)$ for some function $p$ with $p(0) = 0$ satisfying
$\lim_{x\to 0+} p'(x)f_0(p(x)) = 1$. Then $nF_0(a_n) \to 1$.

Proof. This follows easily from l'Hôpital's rule, since

$$\lim_{n\to\infty} nF_0(a_n) = \lim_{x\to 0+}\frac{F_0(p(x))}{x} = \lim_{x\to 0+} f_0(p(x))p'(x) = 1.$$

□



Proof of Corollary 1.2. Under the assumption G0 we see that $F_0(x) \sim
f_0(0+)x$ as $x \to 0$, so (1.5) holds with $\gamma = 1$. The claim that $a_n = 1/(nf_0(0+))$
satisfies $nF_0(a_n) \to 1$ follows from Lemma 5.4 with $p(x) = x/f_0(0+)$. For (i) note
that $\hat h_1(0) = \hat h_1(0+) = \sup_{t>0}(N(t)/t)$, and the indicated equality in distribution
follows from Pyke (1959); see Proposition 1.5 and its proof. (ii) follows directly
from (i) of Theorem 1.1. To prove (iii), note that from (ii) of Theorem 1.1 it
suffices to show that

$$\sup_{0 < t \le c}|\hat h_1(t) - 1| = \hat h_1(0+) - 1 = \hat h_1(0) - 1 = Y_1 - 1 \tag{5.34}$$

for each $c > 0$, where $\hat h_1(t)$ is the right derivative of the LCM of $N(t)$. The equality
in (5.34) holds if $\hat h_1(c) \ge 1$, since $\hat h_1$ is decreasing by definition. By the switching
relation (2.12), we have the equivalence

$$\{\hat h_1(c) \ge 1\} = \{\hat s^L(1) \ge c\}.$$

The equality in (5.34) thus follows if $\hat s^L(1) = \infty$. That is, if

$$N(t) - t < \sup_{y>0}\{N(y) - y\} \quad \text{for all finite } t.$$

Let $W = \sup_{y>0}\{N(y) - y\}$. Pyke (1959, pages 570-571) showed that $P(W \le
x) = 0$ for $x \ge 0$; i.e. $P(W = \infty) = 1$. □

Proof of Corollary 1.3. Under the assumption G1 we see that $F_0(x) \sim
C_1 x(\log(1/x))^{\beta}$ as $x \to 0$, so (1.5) holds with $\gamma = 1$. The claim that $a_n =
1/(C_1 n(\log n)^{\beta})$ satisfies $nF_0(a_n) \to 1$ follows from Lemma 5.4 with $p(x) =
x/(C_1(\log(1/x))^{\beta})$. For (i) note that $\hat h_1(0) = \hat h_1(0+) = \sup_{t>0}(N(t)/t)$ just as in
the proof of Corollary 1.2. (ii) again follows directly from (i) of Theorem 1.1,
and the proof of (iii) is just the same as in the proof of Corollary 1.2. □

Proof of Corollary 1.4. Under the assumption G2 we see that $F_0(x) \sim
C_2 x^{1-\alpha}/(1-\alpha)$ as $x \to 0$, so (1.5) holds with $\gamma = 1 - \alpha$. The claim that
$a_n = \{(1-\alpha)/(nC_2)\}^{1/(1-\alpha)}$ satisfies $nF_0(a_n) \to 1$ follows from Lemma 5.4 with
$p(x) = ((1-\alpha)x/C_2)^{1/(1-\alpha)}$. For (i) note that

$$\hat h_{1-\alpha}(0) = \hat h_{1-\alpha}(0+) = \sup_{t>0}\big(N(t^{1-\alpha})/t\big) = \sup_{s>0}\big(N(s)/s^{1/(1-\alpha)}\big)$$

much as in the proof of Corollary 1.2. (ii) and (iii) follow directly from (i) and
(ii) of Theorem 1.1. □
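The normalization in Corollary 1.4 can be checked numerically for a concrete density. The sketch below takes the Gamma(1 - alpha) density f_0(x) = x^(-alpha) e^(-x) / Gamma(1 - alpha) as an illustrative choice (an assumption; the paper does not single out this example), for which C_2 = 1/Gamma(1 - alpha). The distribution function is evaluated with the standard power series of the lower incomplete gamma function, and the helper names are hypothetical.

```python
import math

def gamma_cdf(x: float, shape: float, terms: int = 60) -> float:
    """Gamma(shape, 1) cdf via the series x^s / Gamma(s) * sum_k (-x)^k / (k! (s + k)).

    Intended for small x, where the alternating series converges quickly.
    """
    total = 0.0
    term = 1.0                      # holds (-x)^k / k!
    for k in range(terms):
        total += term / (shape + k)
        term *= -x / (k + 1)
    return x**shape * total / math.gamma(shape)

def a_n(n: int, alpha: float) -> float:
    """a_n = ((1 - alpha) / (n C_2))^(1 / (1 - alpha)) with C_2 = 1 / Gamma(1 - alpha)."""
    shape = 1.0 - alpha
    c2 = 1.0 / math.gamma(shape)
    return (shape / (n * c2)) ** (1.0 / shape)

alpha = 0.5                          # illustrative value (assumed)
values = {n: n * gamma_cdf(a_n(n, alpha), 1.0 - alpha) for n in (100, 10_000, 1_000_000)}
for n, value in values.items():
    print(n, value)                  # should approach 1 as n grows
```

The printed values of nF_0(a_n) approach 1 from below, in line with Lemma 5.4.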

Proof of Proposition 1.5. The part of the proposition with $\gamma = 1$ follows from
Pyke (1959, pages 570-571); this is closely related to a classical result of
Daniels (1945) for the empirical distribution function; see e.g.
Shorack and Wellner (1986, Theorem 9.1.2, page 345).



The proof for the case $\gamma < 1$ proceeds much along the lines of Mason (1983,
pages 103-105). Fix $x > 0$ and $\gamma < 1$. We aim at establishing an expression for
the distribution function of $Y_{\gamma} \equiv \sup_{s>0}(N(s)/s^{1/\gamma})$ at $x > 0$. First, observe that

$$P(Y_{\gamma} \le x) = P\Big(\sup_{s>0}\frac{N(s)}{s^{1/\gamma}} \le x\Big)$$

$$= P(N(t) \le U(t) \text{ for all } t > 0), \tag{5.35}$$

where the function $U(t) = xt^{1/\gamma}$. For $j \in \mathbb{N}$ let $t_j := (j/x)^{\gamma}$, and note that
$t_1 < t_2 < \ldots$ and $U(t_j) = j$.
Define sets $B$ and $C$ by

$$B \equiv [N(t_k) \ne k \text{ for all } k \ge 1] \quad \text{and} \quad C \equiv [N(s) > U(s) \text{ for some } s > 0].$$

Then $P(B \cap C) = 0$ as a consequence of the following argument: Suppose that
there exists some $t > 0$ and $k \in \mathbb{N}$ such that $k = N(t) > U(t)$ and $N(t_i) \ne i$ for all
$i \ge 1$. It then follows that $t_k > t$, for otherwise it follows that $k = U(t_k) \le U(t)$,
as $U(\cdot)$ is increasing, which is a contradiction. Therefore, $t_k > t$ implies that
$N(t_k) > N(t) = k$, as $N(\cdot)$ is non-decreasing while $N(t_k) = k$ is disallowed
by hypothesis. Hence, $N(t_i) > i$ holds true for all $i \ge k$, for otherwise there
would exist some $j > k$ such that $N(t_j) = j$, since $N(\cdot)$ is a counting process.
Therefore, for each $i \ge k$ we have that $N(s) \ge i + 1$ holds for all $t_i \le s \le t_{i+1}$
and, consequently, that $N(s) \ge U(s)$ holds for all $s \ge t_k$. This implies that
$B \cap C \subset [\liminf_{s\to\infty}\{N(s)/s^{1/\gamma}\} \ge x]$ and therefore $P(B \cap C) = 0$, since the
SLLN implies that $N(s)/s^{1/\gamma} \to 0$ holds almost surely, for fixed $\gamma < 1$.

We conclude that $P(C) = P(C \cap B^c)$. Furthermore, since $U$ is a strictly
increasing function, and since $N$ has jumps at the points $\{t_k\}$ with probability
zero, we also find that $P(C \cap B^c) = P(B^c)$. Finally, partition $B^c$ as $B^c =
\bigcup_{k=1}^{\infty} A_k$ for the disjoint sets $A_k \equiv [N(t_k) = k, N(t_j) \ne j \text{ for all } 1 \le j < k]$, $k \ge 1$.
Combining all arguments above, we conclude that

$$P(Y_{\gamma} \le x) = 1 - P(C) = 1 - \sum_{k=1}^{\infty} P(A_k),$$

where $P(A_1) = P(N(t_1) = 1) = p(t_1; 1)$, and, for $k \ge 2$, $P(A_k)$ may be written
as

$$P(N(t_k) = k) - P(\{N(t_k) = k\} \cap \{N(t_i) \ne i, i < k\}^c)$$

$$= P(N(t_k) = k) - \sum_{j=1}^{k-1} P(N(t_k) = k, N(t_j) = j, N(t_i) \ne i, i < j)$$

$$= P(N(t_k) = k) - \sum_{j=1}^{k-1} P(N(t_k) - N(t_j) = k - j)\,P(N(t_j) = j, N(t_i) \ne i, i < j).$$

The result follows. □
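The recursion for P(A_k) at the end of the proof lends itself to direct computation. The following is a rough sketch under stated assumptions: the infinite sum is truncated at a finite `k_max`, Poisson probabilities are computed on the log scale, and the function names are hypothetical. The same recursion also applies when gamma = 1 and x > 1, where the distribution function is known to be 1 - 1/x (since Y_1 has the distribution of 1/U with U uniform on (0, 1), as in (1.1)), which provides a useful check.

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Poisson(lam) = k) for lam > 0, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def cdf_Y(x: float, gamma: float, k_max: int = 400) -> float:
    """Approximate P(Y_gamma <= x) = 1 - sum_k P(A_k), truncated at k_max.

    Uses t_k = (k / x)**gamma and the recursion
    P(A_k) = P(N(t_k) = k) - sum_{j<k} P(N(t_k) - N(t_j) = k - j) P(A_j).
    The truncation level k_max is an assumption of this sketch.
    """
    t = [0.0] + [(k / x) ** gamma for k in range(1, k_max + 1)]
    prob_a = [0.0] * (k_max + 1)
    for k in range(1, k_max + 1):
        correction = sum(poisson_pmf(t[k] - t[j], k - j) * prob_a[j]
                         for j in range(1, k))
        prob_a[k] = poisson_pmf(t[k], k) - correction
    return 1.0 - sum(prob_a[1:])

print(cdf_Y(2.0, 1.0))   # compare with 1 - 1/2
print(cdf_Y(4.0, 1.0))   # compare with 1 - 1/4
print(cdf_Y(1.0, 0.5), cdf_Y(3.0, 0.5))
```

For gamma < 1 the terms decay very quickly, so a much smaller truncation level would already suffice; for gamma = 1 with x close to 1 the series converges slowly and `k_max` would need to be increased.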

Appendix B: Definitions from Convex Analysis 

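The epigraph (hypograph) of a function $f$ from a subset $S$ of $\mathbb{R}^d$ to $[-\infty, +\infty]$
is the subset $\mathrm{epi}(f)$ ($\mathrm{hypo}(f)$) of $\mathbb{R}^{d+1}$ defined by

$$\mathrm{epi}(f) = \{(x, t) : x \in S, t \in \mathbb{R}, t \ge f(x)\},$$
$$\mathrm{hypo}(f) = \{(x, t) : x \in S, t \in \mathbb{R}, t \le f(x)\}.$$

The function $f$ is convex if $\mathrm{epi}(f)$ is a convex set. The effective domain of a
convex function $f$ on $S$ is

$$\mathrm{dom}(f) = \{x \in \mathbb{R}^d : (x, t) \in \mathrm{epi}(f) \text{ for some } t\} = \{x \in \mathbb{R}^d : f(x) < \infty\}.$$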
The $t$-sublevel set of a convex function $f$ is the set $C_t = \{x \in \mathrm{dom}(f) :
f(x) \le t\}$, and the $t$-superlevel set of a concave function $g$ is the set $S_t = \{x \in
\mathrm{dom}(g) : g(x) \ge t\}$. The sets $C_t$, $S_t$ are convex. The convex hull of a set $S \subset \mathbb{R}^d$,
denoted by $\mathrm{conv}(S)$, is the intersection of all the convex sets containing $S$.

A convex function $f$ is said to be proper if its epigraph is non-empty and
contains no vertical lines; i.e. if $f(x) < +\infty$ for at least one $x$ and $f(x) > -\infty$
for every $x$. Similarly, a concave function $g$ is proper if the convex function $-g$ is
proper. The closure of a concave function $g$, denoted by $\mathrm{cl}(g)$, is the pointwise
infimum of all affine functions $h \ge g$. If $g$ is proper, then

$$\mathrm{cl}(g)(x) = \limsup_{y \to x} g(y).$$

For every proper convex function $f$ there exists a closed proper convex function
$\mathrm{cl}(f)$ such that $\mathrm{epi}(\mathrm{cl}(f)) = \mathrm{cl}(\mathrm{epi}(f))$. The conjugate function $g^*$ of a concave
function $g$ is defined by

$$g^*(y) = \inf\{\langle x, y\rangle - g(x) : x \in \mathbb{R}^d\},$$

and the conjugate function $f^*$ of a convex function $f$ is defined by

$$f^*(y) = \sup\{\langle x, y\rangle - f(x) : x \in \mathbb{R}^d\}.$$
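As a small illustration of the convex conjugate, the sketch below approximates f* on a finite grid for the one-dimensional function f(x) = x^2/2, whose conjugate is known to be f*(y) = y^2/2. The grid, the test function and the helper name are illustrative assumptions; replacing the supremum over the real line by a maximum over a bounded grid is only adequate when the maximizer lies inside the grid.

```python
def conjugate_on_grid(f, grid, y):
    """Approximate f*(y) = sup_x { x*y - f(x) } by a maximum over a finite grid."""
    return max(x * y - f(x) for x in grid)

def half_square(x):
    """f(x) = x^2 / 2, a convex function equal to its own conjugate."""
    return 0.5 * x * x

# Grid on [-5, 5] with step 0.001 (an assumption of this sketch).
grid = [i / 1000 for i in range(-5000, 5001)]
for y in (-2.0, 0.0, 1.0):
    print(y, conjugate_on_grid(half_square, grid, y), 0.5 * y * y)
```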

If $g$ is concave, then $f = -g$ is convex and $f$ has conjugate $f^*(y) = -g^*(-y)$.
A complete non-decreasing curve is a subset of $\mathbb{R}^2$ of the form

$$\Gamma = \{(x, y) : x \in \mathbb{R}, y \in \mathbb{R}, \varphi_-(x) \le y \le \varphi_+(x)\}$$

for some non-decreasing function $\varphi$ from $\mathbb{R}$ to $[-\infty, +\infty]$ which is not everywhere
infinite. Here $\varphi_+$ and $\varphi_-$ denote the right and left continuous versions of $\varphi$
respectively. A vector $y \in \mathbb{R}^d$ is said to be a subgradient of a convex function $f$
at a point $x$ if

$$f(z) \ge f(x) + \langle y, z - x\rangle \quad \text{for all } z \in \mathbb{R}^d.$$

The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, and
is denoted by $\partial f(x)$.

A face of a convex set C is a convex subset B of C such that every closed 
line segment in C with a relative interior point in B has both endpoints in B. 
If B is the set of points where a linear function h achieves its maximum over C, 
then B is a face of C. If the maximum is achieved on the relative interior of a 
line segment $L \subset C$, then $h$ must be constant on $L$ and $L \subset B$. A face $B$ of this
type is called an exposed face. 